{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "automatic_differentiation.ipynb",
      "provenance": [],
      "collapsed_sections": []
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "MDevcewD85Im"
      },
      "source": [
        "## Automatic Differentiation\n",
        "\n",
        "The [automatic differentiation](https://en.wikipedia.org/wiki/Automatic_differentiation) is to calculate derivative of functions which is useful for algorithms such as [stochastic gradient descent](https://en.wikipedia.org/wiki/Stochastic_gradient_descent).\n",
        "\n",
        "It's is particularly useful when we implement neural networks and desire to calculate differentiation of the output with respect to an input that are connected with a **chain of functions**:\n",
        "\n",
        "$L(x)=f(g(h(x)))$\n",
        "\n",
        "The differentiation is as below:\n",
        "\n",
        "$\\frac{dL}{dx} = \\frac{df}{dg}\\frac{dg}{dh}\\frac{dh}{dx}$\n",
        "\n",
        "The above rule is called the [chain rule](https://en.wikipedia.org/wiki/Chain_rule).\n",
        "\n",
        "So the [gradients](https://en.wikipedia.org/wiki/Gradient) needs to be calculated for ultimate derivative calculations.\n",
        "\n",
        "Let's see how TensorFlow does it!"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "iJoLp_aUvBFR"
      },
      "source": [
        "# Loading necessary libraries\n",
        "import tensorflow as tf\n",
        "import numpy as np"
      ],
      "execution_count": 1,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "w9fp4we9FWQU"
      },
      "source": [
        "### Introduction\n",
        "\n",
        "Some general information are useful to be addressed here:\n",
        "\n",
        "* To compute gradients, TensorFlow uses [tf.GradientTape](https://www.tensorflow.org/api_docs/python/tf/GradientTape) which records the operation for later being used for gradient computation.\n",
        "\n",
        "Let's have three similar example:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "26FVM1x8F-k0",
        "outputId": "556b8873-8aab-4525-eb58-30053330f99b",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 54
        }
      },
      "source": [
        "x = tf.constant([2.0])\n",
        "\n",
        "with tf.GradientTape(persistent=False, watch_accessed_variables=True) as grad:\n",
        "  f = x ** 2\n",
        "\n",
        "# Print gradient output\n",
        "print('The gradient df/dx where f=(x^2):\\n', grad.gradient(f, x))"
      ],
      "execution_count": 11,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "The gradient df/dx where f=(x^2):\n",
            " None\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "y_UO_8xnG6SY",
        "outputId": "fcc2f51b-8bf4-4ee3-c6a6-9180a7a94d90",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 54
        }
      },
      "source": [
        "x = tf.constant([2.0])\n",
        "x = tf.Variable(x)\n",
        "\n",
        "with tf.GradientTape(persistent=False, watch_accessed_variables=True) as grad:\n",
        "  f = x ** 2\n",
        "\n",
        "# Print gradient output\n",
        "print('The gradient df/dx where f=(x^2):\\n', grad.gradient(f, x))"
      ],
      "execution_count": 12,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "The gradient df/dx where f=(x^2):\n",
            " tf.Tensor([4.], shape=(1,), dtype=float32)\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "s2SxqhpeHab7",
        "outputId": "5c0be66e-171b-400b-8d55-d80fde11c7fd",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 54
        }
      },
      "source": [
        "x = tf.constant([2.0])\n",
        "\n",
        "with tf.GradientTape(persistent=False, watch_accessed_variables=True) as grad:\n",
        "  grad.watch(x)\n",
        "  f = x ** 2\n",
        "\n",
        "# Print gradient output\n",
        "print('The gradient df/dx where f=(x^2):\\n', grad.gradient(f, x))"
      ],
      "execution_count": 13,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "The gradient df/dx where f=(x^2):\n",
            " tf.Tensor([4.], shape=(1,), dtype=float32)\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "7TptydLkH2hh"
      },
      "source": [
        "What's the difference between above examples?\n",
        "\n",
        "1. Using tf.Variable on top of the tensor to transform it into a [tf.Variable](https://www.tensorflow.org/guide/variable).\n",
        "2. Using [.watch()](https://www.tensorflow.org/api_docs/python/tf/GradientTape#watch) operation.\n",
        "\n",
        "The tf.Variable turn tensor to a variable tensor which is the recommended approach by TensorFlow. The .watch() method ensures the variable is being tracked by the tf.GradientTape(). \n",
        "\n",
        "**You can see if we use neither, we get NONE as the gradient which means gradients were not being tracked!**\n",
        "\n",
        "NOTE: In general it's always safe to work with variable as well as using .watch() to ensure tracking gradients.\n",
        "\n",
        "We used default arguments as:\n",
        "\n",
        "1. **persistent=False**: It says, any variable that is hold with tf.GradientTape(), after one calling of gradient will be released. \n",
        "2. **watch_accessed_variables=True**: By default watching variables. So if we have a variable, we do not need to use .watch() with this default setting.\n",
        "\n",
        "Let's have an example with **persistent=True**:\n"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "V-ugyJGRHc7S",
        "outputId": "a4813975-128b-4ad2-ad35-5c7270060ed1",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 90
        }
      },
      "source": [
        "x = tf.constant([2.0])\n",
        "x = tf.Variable(x)\n",
        "\n",
        "# For practice, turn persistent to False to see what happens.\n",
        "with tf.GradientTape(persistent=True, watch_accessed_variables=True) as grad:\n",
        "  f = x ** 2\n",
        "  h = x ** 3\n",
        "\n",
        "# Print gradient output\n",
        "print('The gradient df/dx where f=(x^2):\\n', grad.gradient(f, x))\n",
        "print('The gradient dh/dx where h=(x^3):\\n', grad.gradient(h, x))"
      ],
      "execution_count": 16,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "The gradient df/dx where f=(x^2):\n",
            " tf.Tensor([4.], shape=(1,), dtype=float32)\n",
            "The gradient dh/dx where h=(x^3):\n",
            " tf.Tensor([12.], shape=(1,), dtype=float32)\n"
          ],
          "name": "stdout"
        }
      ]
    }
  ]
}